@KJ7LNW
Contributor

@KJ7LNW KJ7LNW commented Apr 9, 2025

Context

This PR enhances the tree-sitter parsers for multiple languages to improve code navigation and analysis capabilities, specifically targeting languages tested in the cte/evals repository.

Implementation

Enhanced tree-sitter parsers for:

  • TypeScript/JavaScript (TSX)
  • C++
  • Go
  • Java
  • Python

Each language parser has been updated to support a comprehensive range of language constructs, significantly improving the ability to extract and navigate code definitions.

Benefits for Evaluation Testing

These enhancements will provide better code navigation and understanding when working with evaluation exercises from https://github.com/cte/evals, which contains exercises for all the enhanced languages.

Testing Recommendation

It would be worth running an eval series with "File read auto-truncate threshold" set to zero. That configuration yields a rich set of program definition line numbers, which reduces context size and keeps the focus on relevant code, without distraction from content that is no longer relevant.

Get in Touch

Discord: KJ7LNW

cc: @cte


Important

Enhances tree-sitter parsers for multiple languages, improving support for various language constructs and adding comprehensive tests and queries.

  • Behavior:
    • Enhances tree-sitter parsers for TypeScript/JavaScript (TSX), C++, Go, Java, and Python to support a wide range of language constructs.
    • Improves code navigation and analysis capabilities for these languages.
  • Testing:
    • Adds comprehensive test files for each language: parseSourceCodeDefinitions.cpp.test.ts, parseSourceCodeDefinitions.go.test.ts, parseSourceCodeDefinitions.java.test.ts, parseSourceCodeDefinitions.python.test.ts, and parseSourceCodeDefinitions.tsx.test.ts.
    • Tests cover various language constructs such as class, function, and method declarations, as well as language-specific features like decorators, templates, and comprehensions.
  • Queries:
    • Updates query files for each language: cpp.ts, go.ts, java.ts, python.ts, tsx.ts, and typescript.ts.
    • Queries are enhanced to capture a broader set of language constructs, including advanced features like async functions, decorators, and template literals.

This description was created by Ellipsis for c2dda05. It will automatically update as commits are pushed.

Eric Wheeler added 5 commits April 8, 2025 19:22
- Enhanced the Tree-Sitter parser for JavaScript/TypeScript with support for advanced language constructs
- Modified the parser to exclude comments from the output
- Consolidated sample code in tests for better maintainability

Signed-off-by: Eric Wheeler <[email protected]>
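
To make the idea of a query file concrete, here is a minimal sketch of a tree-sitter query string for TypeScript together with a small helper that lists the capture names it uses. The node names assume the tree-sitter-typescript grammar. The constant name `typescriptQuerySketch`, the helper `captureNames`, and the `@name.definition.*` / `@definition.*` capture names are assumptions for illustration and are not taken from this PR's diff.

```typescript
// Hypothetical sketch: NOT the query shipped in this PR.
const typescriptQuerySketch = `
; Named function declarations
(function_declaration
  name: (identifier) @name.definition.function) @definition.function

; Class declarations
(class_declaration
  name: (type_identifier) @name.definition.class) @definition.class

; Methods inside a class body
(method_definition
  name: (property_identifier) @name.definition.method) @definition.method

; Arrow functions bound to a const or let name
(lexical_declaration
  (variable_declarator
    name: (identifier) @name.definition.function
    value: (arrow_function))) @definition.function
`;

// Return the distinct capture names (the words after "@") in a query string,
// sorted alphabetically.
function captureNames(query: string): string[] {
  const pattern = /@([A-Za-z_][\w.]*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(query);
  while (match !== null) {
    if (names.indexOf(match[1]) === -1) {
      names.push(match[1]);
    }
    match = pattern.exec(query);
  }
  return names.sort();
}
```

Calling `captureNames(typescriptQuerySketch)` gives a quick inventory of which definition kinds a query can report, which is handy when comparing a query file before and after a change.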
This enhancement significantly expands the C++ parser's capabilities to recognize and extract a wide range of modern C++ language constructs, improving code navigation and analysis.

New supported language constructs include:
- Union declarations and their members
- Destructors and their implementations
- Operator overloading (including stream operators)
- Free-standing and namespace-scoped functions
- Enum declarations (both traditional and scoped enum class)
- Lambda expressions and their captures
- Attributes and annotations
- Method overrides with virtual/override specifiers
- Exception specifications (noexcept)
- Default parameters in function declarations
- Variadic templates and parameter packs
- Structured bindings (C++17)
- Inline namespaces and nested namespace declarations
- Template specializations and instantiations
- Constructor implementations

This enhancement provides more comprehensive code structure analysis for C++ codebases, particularly those using modern C++ features from C++11, C++14, and C++17 standards.

Signed-off-by: Eric Wheeler <[email protected]>
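
As a rough illustration of how a few of the C++ constructs listed above could be expressed as tree-sitter query patterns, consider the sketch below. It assumes node names from the tree-sitter-cpp grammar, which can differ between grammar versions. The constant name `cppQuerySketch` and the capture names are hypothetical and are not the PR's actual queries.

```typescript
// Hypothetical sketch: node names assume tree-sitter-cpp, captures are invented.
const cppQuerySketch = `
; Union declarations
(union_specifier
  name: (type_identifier) @name.definition.union) @definition.union

; Enum declarations, traditional and scoped
(enum_specifier
  name: (type_identifier) @name.definition.enum) @definition.enum

; Free-standing functions
(function_definition
  declarator: (function_declarator
    declarator: (identifier) @name.definition.function)) @definition.function

; Destructors
(function_definition
  declarator: (function_declarator
    declarator: (destructor_name) @name.definition.destructor)) @definition.destructor

; Operator overloads
(function_definition
  declarator: (function_declarator
    declarator: (operator_name) @name.definition.operator)) @definition.operator

; Lambda expressions
(lambda_expression) @definition.lambda
`;
```

Out-of-class definitions such as `Foo::~Foo()` are typically wrapped in a qualified name, so a real query would likely need additional patterns beyond this sketch.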
This enhancement significantly expands the Go parser's capabilities to recognize and extract a comprehensive set of language constructs:

- Added support for struct and interface definitions with proper type identification
- Implemented parsing for constant declarations (both single and in blocks)
- Added support for variable declarations (both single and in blocks)
- Added recognition of type aliases with proper distinction from regular types
- Implemented special handling for init functions
- Added support for anonymous functions, including nested function literals
- Improved documentation and organization of query patterns

These enhancements enable more accurate code navigation, better symbol extraction, and improved code intelligence for Go codebases.

Signed-off-by: Eric Wheeler <[email protected]>
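
The Go constructs named in the commit above map fairly directly onto tree-sitter node types. The following is a sketch only, assuming node names from the tree-sitter-go grammar (newer grammar versions may wrap grouped declarations in extra list nodes). The constant name `goQuerySketch` and all capture names are hypothetical.

```typescript
// Hypothetical sketch: node names assume tree-sitter-go, captures are invented.
const goQuerySketch = `
; Struct types
(type_declaration
  (type_spec
    name: (type_identifier) @name.definition.struct
    type: (struct_type))) @definition.struct

; Interface types
(type_declaration
  (type_spec
    name: (type_identifier) @name.definition.interface
    type: (interface_type))) @definition.interface

; Constants
(const_declaration
  (const_spec
    name: (identifier) @name.definition.constant)) @definition.constant

; Variables
(var_declaration
  (var_spec
    name: (identifier) @name.definition.variable)) @definition.variable

; Type aliases such as: type ID = string
(type_declaration
  (type_alias
    name: (type_identifier) @name.definition.type_alias)) @definition.type_alias

; init functions, matched by name with a predicate
((function_declaration
  name: (identifier) @name.definition.init) @definition.init
  (#eq? @name.definition.init "init"))

; Anonymous functions, including nested function literals
(func_literal) @definition.anonymous_function
`;
```

The `#eq?` predicate is what distinguishes `init` from ordinary functions here. Separating `type_spec` from `type_alias` is one way to keep aliases distinct from regular type definitions.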
This enhancement significantly expands the Java parser's capabilities to recognize and parse a wide range of Java language constructs:

- Added support for enum declarations and enum constants
- Added support for annotation type declarations and elements
- Added support for field declarations
- Added support for constructor declarations
- Added support for lambda expressions
- Added support for inner and anonymous classes
- Added support for type parameters (generics)
- Added support for package and import declarations

These improvements enable more comprehensive code analysis for Java projects, providing better definition extraction and navigation capabilities.

Signed-off-by: Eric Wheeler <[email protected]>
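
For the Java constructs listed above, a query could look roughly like the sketch below. It assumes node names from the tree-sitter-java grammar. The constant name `javaQuerySketch` and the capture names are illustrative assumptions rather than the contents of the PR's `java.ts`.

```typescript
// Hypothetical sketch: node names assume tree-sitter-java, captures are invented.
const javaQuerySketch = `
; Enums and their constants
(enum_declaration
  name: (identifier) @name.definition.enum) @definition.enum
(enum_constant
  name: (identifier) @name.definition.enum_constant) @definition.enum_constant

; Annotation type declarations
(annotation_type_declaration
  name: (identifier) @name.definition.annotation) @definition.annotation

; Field declarations
(field_declaration
  declarator: (variable_declarator
    name: (identifier) @name.definition.field)) @definition.field

; Constructor declarations
(constructor_declaration
  name: (identifier) @name.definition.constructor) @definition.constructor

; Lambda expressions
(lambda_expression) @definition.lambda

; Package and import declarations
(package_declaration) @definition.package
(import_declaration) @definition.import
`;
```

Inner classes need no separate pattern in this sketch, since a `class_declaration` pattern matches at any nesting depth.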
…ures

This commit significantly enhances the Python tree-sitter parser to support a comprehensive range of Python language constructs, enabling more accurate and detailed code analysis.

Key improvements:
- Added support for method definitions (instance, class, and static methods)
- Added support for decorators on functions and classes
- Added support for module-level variables and constants
- Added support for async functions and methods
- Added support for property getters/setters
- Added support for type annotations in various contexts
- Added support for dataclasses
- Added support for nested functions and classes
- Added support for generator functions
- Added support for list/dict/set comprehensions
- Added support for lambda functions
- Added support for abstract base classes and methods

The parser now handles Python's rich feature set more comprehensively, including special Python patterns like decorators, type annotations, and various comprehension types. This enables better code navigation, understanding, and analysis for Python codebases.

Signed-off-by: Eric Wheeler <[email protected]>
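
Several of the Python features above can be captured with short patterns. The sketch below assumes node names from the tree-sitter-python grammar. The constant name `pythonQuerySketch` and the capture names are hypothetical, and the real `python.ts` in this PR is considerably longer.

```typescript
// Hypothetical sketch: node names assume tree-sitter-python, captures are invented.
const pythonQuerySketch = `
; Functions and methods, including async ones, share one node type
(function_definition
  name: (identifier) @name.definition.function) @definition.function

; Classes, including nested classes and dataclasses
(class_definition
  name: (identifier) @name.definition.class) @definition.class

; Decorated functions, for example property getters or static methods
(decorated_definition
  definition: (function_definition
    name: (identifier) @name.definition.decorated)) @definition.decorated

; Module-level variables and constants
(module
  (expression_statement
    (assignment
      left: (identifier) @name.definition.variable) @definition.variable))

; Lambdas bound to a name
(assignment
  left: (identifier) @name.definition.lambda
  right: (lambda)) @definition.lambda

; List, dict, and set comprehensions
(list_comprehension) @definition.comprehension
(dictionary_comprehension) @definition.comprehension
(set_comprehension) @definition.comprehension
`;
```

Anchoring the variable pattern under `module` is one way to restrict matches to top-level assignments and skip locals inside functions.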
@changeset-bot

changeset-bot bot commented Apr 9, 2025

⚠️ No Changeset found

Latest commit: c2dda05

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 9, 2025
@ellipsis-dev
Contributor

ellipsis-dev bot commented Apr 9, 2025

The pull request enhances tree-sitter parsers for multiple languages, including C++, Go, Java, Python, TypeScript, and TSX. The changes are extensive, but they all belong to a single, cohesive feature enhancement, so there is no need to split this pull request into smaller ones.

@KJ7LNW
Contributor Author

KJ7LNW commented Apr 9, 2025

This looks big, but most of it is tests. These are the substantive changes:

]$ git diff --stat origin/main src/services/tree-sitter/queries/
 src/services/tree-sitter/queries/cpp.ts        |  85 +++++++++++-
 src/services/tree-sitter/queries/go.ts         |  51 +++++++
 src/services/tree-sitter/queries/java.ts       |  55 +++++++-
 src/services/tree-sitter/queries/python.ts     | 191 ++++++++++++++++++++++++++
 src/services/tree-sitter/queries/tsx.ts        |  41 +++++-
 src/services/tree-sitter/queries/typescript.ts |  32 +++++
 6 files changed, 448 insertions(+), 7 deletions(-)

@dosubot dosubot bot added the enhancement New feature or request label Apr 9, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 9, 2025
@mrubens mrubens merged commit 63c8f92 into RooCodeInc:main Apr 9, 2025
21 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Apr 9, 2025
SmartManoj pushed a commit to SmartManoj/Raa-Code that referenced this pull request May 6, 2025
* feat(bedrock): adding two regions

* changeset
